Abstract. We give the first $O(1/\sqrt{T})$-error online algorithm for
reconstructing noisy statistical databases, where $T$ is the number of (online)
sample queries received. The algorithm, which requires only $O(\log T)$
memory, aims to learn a hidden database-vector $w^* \in \mathbb{R}^D$ in order to
accurately answer a stream of queries regarding the hidden database,
which arrive in an online fashion from some unknown distribution $\mathcal{D}$.
We assume the distribution $\mathcal{D}$ is defined on the neighborhood of a
low-dimensional manifold. The presented algorithm runs in $O(dD)$-time per
query, where $d$ is the dimensionality of the query-space. Contrary to the
classical setting, there is no separate training set that is used by the
algorithm to learn the database: the stream on which the algorithm
will be evaluated must also be used to learn the database-vector. The
algorithm only has access to a binary oracle $\mathcal{O}$ that answers whether
a particular linear function of the database-vector plus random noise is
larger than a threshold, which is specified by the algorithm. We note
that we allow for a significant $O(D)$ amount of noise to be added while
other works focused on the low noise $o(\sqrt{D})$-setting. For a stream of $T$
queries our algorithm achieves an average error $O(1/\sqrt{T})$ by filtering out
random noise, adapting threshold values given to the oracle based on its
previous answers and, as a consequence, recovering with high precision a
projection of a database-vector $w^*$ onto the manifold defining the
query-space.


1 Introduction 


Protecting databases that contain sensitive information has become increasingly
important due to its crucial practical applications, such as the disclosure of
sensitive health data. Privacy preservation plays a key role in this setting since
such data is often published in anonymized form so it can be used by analysts
and researchers. Several mechanisms have been proposed, such as differential
privacy, that allow for learning from a database while preserving privacy
guarantees ([1, 2, 3, 4, 5]). At the other extreme are many results showing how
database privacy can be compromised by an adversary who is able to collect
perturbed answers to a large number of queries regarding the database
([6, 7, 8, 9, 10]). Existing results related to breaking the privacy of a database
have several key limitations. For example, most assume that each query is
represented by a vector $q$ of $D$ independent entries taken from some fixed
distribution (such as the Gaussian distribution or a specific discrete
distribution), and that this structure is known to the privacy-breaking algorithm.
Also, most methods learn an approximation of the unknown database-vector $w^*$
that has $L_2$ error $\epsilon D$ for some small constant $\epsilon > 0$. Such
precision is not sufficient to obtain $o(1)$-error on the stream of $T$ queries
for $T \gg D$, as is the case in our model. Further, the focus
has typically been on the offline setting, where the adversary first collects all
the queries, then applies some privacy-breaking algorithm, and finally uses the
reconstructed database-vector to compute good approximations of the required
statistics. From the machine learning point of view this means that the overall
protocol for the adversary consists of two distinct phases: a training phase and
a testing phase. Finally, the memory resources used by privacy-breaking
algorithms are typically not analyzed, even though this is a crucial issue for the
setting considered here, where the number of all the queries $q$ coming in the
stream may be huge.

The goal of this paper is to present and analyze a database privacy-breaking
algorithm for a more realistic setting in which the limitations described above
are lifted. The entries of the query-vector are not necessarily independent. The
distribution $\mathcal{D}$ of the query-vector is not known to the adversary. The
adversary is not able to first learn the database-vector before being evaluated.
Our algorithm uses only $O(\log(T))$-size memory to process the entire stream of
$T$ queries and therefore is well-suited to the limited resources scenario. To make
the life of the adversary even more difficult, we assume that the database
mechanism provides only a binary oracle $\mathcal{O}$ that answers whether the
perturbed value of a dot-product between the database-vector $w^*$ and the
query-vector $q$ is greater than a threshold that is specified by the adversary.
Thus the algorithm has very limited access to the database even in the noiseless
scenario. Dot-products between a query-vector and a database-vector are
considered in most of the settings analyzing database privacy-breaking algorithms.
Considering this more challenging setting, we will show that much less than the
noisy answer is needed to carry out an effective attack and compromise data
privacy.

In some of the mentioned papers an effort is made to learn a good approximation
of the database vector with a small number of queries that is only linear in
the size of the database $D$. We use many more queries but our task is more
challenging: we need a much more accurate approximation, and we get information
only about the sign of the perturbed product as opposed to the perturbed
product itself. Finally, we are penalized whenever we make a mistake. Our
goal is to minimize the average error of the algorithm over a long sequence of
queries, so we need to learn this more accurate approximation very fast.


In this paper we present the first online algorithm that an adversary can use
to reconstruct a noisy statistical database protected by a binary oracle
$\mathcal{O}$ that achieves average error $O(1/\sqrt{T})$ on the stream of $T$
queries and operates in logarithmic memory. From now on we will call this
algorithm a learning algorithm. The learning algorithm is given a set of queries
taken from some unknown distribution $\mathcal{D}$ defined on a neighborhood of
the low-dimensional manifold that it needs to answer in the order that they arrive
(note that the entries of a fixed query do not have to be independent). The
learning algorithm can use the information learned from previously collected
queries but cannot wait for other queries to learn a more accurate answer. Every
received query can be used only once to communicate with a database. The database
mechanism calculates a perturbed answer to the query and passes the result to the
binary oracle $\mathcal{O}$. The binary oracle uses the threshold provided by the
adversary and passes a “Yes/No”-answer to the adversary. The error made for a
single query is defined as $|z_t - w^* \cdot q^t|$, where $q^t$ and $z_t$ are the
query and answer, respectively, provided by the learning algorithm in round $t$.
As a byproduct of our methods, we recover with high precision the projection of
the database-vector $w^*$ onto the query-space. Our approximation is within
$O(1/\sqrt{T})$ $L_2$-distance from the exact projection. By comparison, most
of the previous papers focused on approximating/recovering all but at most a
constant fraction $\epsilon D$ of all the entries of $w^*$, which is unacceptably
inaccurate in our learning setting where $T \gg D$.

The assumption that queries are taken from a low-dimensional manifold is in
perfect agreement with recent developments in machine learning (see: [11], [12],
[13]). It leads to the conclusion that, as stated in one of these works, “a lot
of data which superficially lie in a very high-dimensional space actually have
low intrinsic dimensionality, in the sense of lying close to a manifold of
dimension $d \ll D$”. Assume that the queries are taken from a truly
high-dimensional space. Then as long as the number of all queries is polynomial
in $D$, the average distances between them are substantial. In this scenario any
nontrivial noisy setting prevents the adversary from learning anything about the
database, since a single perturbed answer does not give much information and
the probability that a close enough query will be asked in the future is negligible
in $D$. In practice we observe however that noise can very often be filtered out
and a significant number of queries can give nontrivial information about a
database-vector $w^*$. In this paper we explain this phenomenon from the
theoretical point of view. Our algorithm accurately reconstructs the part of the
database that regards the lower-dimensional space used for querying. We show that
this suffices to achieve average $O(1/\sqrt{T})$-error on the set of $T$ given
queries.

In our model, the number of queries significantly exceeds the dimensionality of
the database, and therefore we focus on optimizing our algorithm’s time complexity
and accuracy as a function of $T$. Having said that, in most of the formulas
derived in the paper we will also explicitly give the dependence on other
parameters of the model such as the dimensionality of the database $D$ and the
dimensionality of the query-space $d$. We are mainly interested in the setting
$d \ll D \ll T$. If we use the $O$-notation where the dependency is not
explicitly given, then we treat all missing parameters as constants.

It should also be emphasized that, contrary to most previous work on
reconstructing databases based on perturbed statistics, the proposed algorithm
does not rely on a general-purpose linear programming solver and thus gives better
theoretical guarantees regarding running time than most existing methods. The
algorithm uses a subroutine whose goal is to solve a linear program; however, we
show that this program has a closed-form solution. Therefore we do not need to use
any techniques such as simplex or the ellipsoid method. The algorithm is very
fast: it needs only $O(dD)$-time per query. More detailed analysis of the running
time of the algorithm as well as memory usage will be given in the Appendix.

2 Model description and main result 

We will now describe in detail our database access model. We assume that the
database can be encoded by the database-vector $w^* \in \mathbb{R}^D$. For
definiteness we will consider $w^*_i \in [0,1]$ for $i = 1, \ldots, D$. Our method
can however be used in the much more general setting, as long as $w^*$ is taken
from some fixed ball in $L_\infty$. Each query can be represented as a vector
$q = (q_1, \ldots, q_D)$, where $0 \le q_i \le 1$ and
$q_1^2 + \cdots + q_D^2 > 0$. Queries are taken independently at random
from the unknown distribution $\mathcal{D}$ (notice that entries of a fixed query
do not have to be independent). The distribution $\mathcal{D}$ is defined on some
$d$-dimensional linear subspace $\mathcal{U} \subseteq \mathbb{R}^D$ ($d < D$). The
exact answer to the query is given as $a = \sum_{i=1}^{D} w^*_i q_i$. For the
$t$th coming query $q^t$ the learning algorithm $\mathcal{L}$ selects the
threshold value $\theta^t$ and passes $q^t$ to the database mechanism
$\mathcal{M}$, which computes $a^t = w^* \cdot q^t$. The noisy version
$\tilde{a}^t$ of $a^t$ as well as $\theta^t$ is passed by $\mathcal{M}$ and
$\mathcal{L}$ to the binary oracle $\mathcal{O}$, which answers whether
$\tilde{a}^t$ is greater than $\theta^t$.

The value $\mathcal{O}(\tilde{a}^t, \theta^t)$ is then given to $\mathcal{L}$. The
learner records this value and can also use the information obtained from
previously received queries to give an answer $z_t$ to the query $q^t$. However
it has only $O(\log(T))$-memory available. Further, for a fixed query the learner
only has one-time access to the binary oracle $\mathcal{O}$.

The noise $\epsilon^t = \tilde{a}^t - a^t$ is generated independently at random
and is of the form $D\mathcal{E}$, where $\mathcal{E}$ is some known distribution
producing values from some bounded range $[-u, u]$. The boundedness assumption is
not crucial. Technically speaking, as long as the random variable is not
heavy-tailed (which is a standard assumption), our approach works. In fact even
this condition is unnecessarily strong. This will become obvious later when we
describe and analyze our method.
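
The access protocol described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the name `make_oracle`, the choice of uniform noise on $[-u, u]$ and all numeric values are hypothetical.

```python
import random

def make_oracle(w_star, noise_sampler):
    """Return a function that answers whether the noisy dot product
    w* . q + D * noise exceeds the threshold chosen by the learner."""
    D = len(w_star)

    def oracle(q, theta):
        exact = sum(wi * qi for wi, qi in zip(w_star, q))
        noisy = exact + D * noise_sampler()  # noise of the form D * E
        return noisy > theta                 # the learner only sees this bit

    return oracle

rng = random.Random(0)
u = 0.01                                     # assumed noise range [-u, u]
w_star = [rng.random() for _ in range(50)]   # hidden database-vector in [0, 1]^D
oracle = make_oracle(w_star, lambda: rng.uniform(-u, u))

q = [rng.random() for _ in range(50)]
exact = sum(wi * qi for wi, qi in zip(w_star, q))
```

A threshold far below `exact` always yields a positive answer and one far above always yields a negative answer; the informative thresholds are those within the noise range of the exact value.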

This setting covers standard scenarios where computing every single product
in the sum of $D$ terms for $w^* \cdot q^t$ gives an independent bounded error. We
should notice here that in most of the previous papers the magnitude of the noise
added was of the order $o(\sqrt{D})$. For instance, one of these works
reconstructs a database that agrees with the ground-truth one on all but
$(2c\alpha)^2$ entries, where $\alpha$ is a noise magnitude and $c > 0$ is a
constant. Thus, even though previous works do not assume that noise was added
independently for every query, the average error per single product in the
dot-product sum was only of the magnitude $o(1/\sqrt{D})$. This assumption
significantly narrows the range of possible applications. This is no longer the
case in our setting, where some mild and reasonable assumptions regarding
independence of noise added to different queries and low-dimensionality of the
querying space lead to a model much more robust to noise. We will assume that
the $\epsilon^t$ do not have singularities, i.e. $P(\epsilon^t = c) = 0$
for any fixed $c$.

We need a few more definitions. 


Definition 1. We say that a vector $w$ computed by the learning algorithm
$\epsilon$-approximates database-vector $w^*$ if
$\|\Pi_{\mathcal{U}}(w) - \Pi_{\mathcal{U}}(w^*)\|_\infty < \epsilon$, where
$\Pi_{\mathcal{U}}(v)$ stands for the projection of $v$ onto the $d$-dimensional
querying space $\mathcal{U}$.

Definition 2. Let $\mathcal{Q}$ be a probability distribution on the unit sphere
$S(0,1)$ in $L_2$. For a fixed vector $q \in S(0,1)$ we denote by
$p^{\mathcal{Q}}_{q,\theta}$ the probability that a vector $x$ selected according
to $\mathcal{Q}$ satisfies: $q \cdot x \ge \cos(\theta)$.


Definition 3. Take a distribution $\mathcal{D}$ from which queries are taken.
Assume that $\mathcal{D}$ is defined on the $d$-dimensional space $\mathcal{U}$
with orthonormal basis $\mathcal{B}$. Denote by $\mathcal{D}_n$ the normalized
version of $\mathcal{D}$ and by $\mathcal{B}_n$ the normalized version of
$\mathcal{B}$ (all vectors rescaled to length 1 in the $L_2$-norm). Then we
define: $p_{\mathcal{D},\theta} = \min_{q \in \mathcal{B}_n} p^{\mathcal{D}_n}_{q,\theta}$.
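
To make Definitions 2 and 3 concrete, the quantity $p_{\mathcal{D},\theta}$ can be estimated empirically from a sample of queries. The sketch below is illustrative only: `estimate_p` is a hypothetical helper, the basis vectors are assumed to be unit length already, and the four sample vectors are made up.

```python
import math

def normalize(v):
    """Rescale v to unit length in the L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def estimate_p(samples, basis, theta):
    """Empirical version of Definition 3: for each basis vector, the fraction
    of normalized samples within angle theta of it, minimized over the basis."""
    unit = [normalize(x) for x in samples]
    c = math.cos(theta)
    fractions = []
    for e in basis:
        hits = sum(1 for x in unit if sum(a * b for a, b in zip(e, x)) >= c)
        fractions.append(hits / len(unit))
    return min(fractions)

basis = [[1.0, 0.0], [0.0, 1.0]]
# Three queries close to the first basis vector, one close to the second.
samples = [[1.0, 0.01], [1.0, 0.01], [1.0, 0.01], [0.01, 1.0]]
p = estimate_p(samples, basis, 0.1)  # fractions are 0.75 and 0.25, so p = 0.25
```

The minimum over the basis matters because every direction must collect enough observations before the hypercube can be shrunk; the least frequently hit basis vector is the bottleneck.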


The error $\epsilon_q$ the algorithm is making on each query $q$ is defined as the
absolute value of the difference between the exact answer to the query and the
answer that is provided by the algorithm. The average error on the set of queries
$q^1, \ldots, q^T$ is defined as
$\epsilon_{av} = \frac{1}{T} \sum_{i=1}^{T} \epsilon_{q^i}$. Let us now state the
main result of this paper.


Theorem 1. Let $q^1, \ldots, q^T$ be a stream of query-vectors coming in an online
fashion from some $d$-dimensional subspace, where $0 \le q^t_i \le 1$ for
$i = 1, \ldots, D$ and each $q^t$ is a nonzero vector. Then there exists an
algorithm Alg using $O(\log(T))$-memory, acting according to the protocol defined
above, and achieving average
error: 


with probability p SU cc 
2arcsin (i473)- 


&av — 


0(^=(rDid + VD\°g(T))) 


> 1 - 0 ( 


log (dPT) dlog(dT) 


-7*3- 


-|- £$0 ), where r = 



and (f) = 


We will give this algorithm, called OnlineBisection algorithm, in the next 
section. Notice that cf> is well approximated by ■ To see what the magnitude 
of r is in the worst-case scenario it suffices to analyze the setting where q is 
chosen uniformly at random from the query-space U. 

If this is the case then one can notice that pv,<p is of the order D( 2 ~ dl ° s ^) 
thus r = 0(2 dlog A)y jf however there exists a basis of U such that most of the 
mass of D is concentrated around vectors from the basis then standard analysis 
leads to the po ^ d ^ -lower bound on p, i.e. poly(d)- upper bound on r (where 
poly(d) is a polynomial function of d). 

Theorem 1 implies a corollary regarding the batch version of the algorithm,
where test and training set are clearly separated (the proof of that corollary will
be given in the Appendix):

Corollary 1. Let $w_T$ denote the final hypothesis constructed by the
OnlineBisection algorithm after consuming $T$ queries drawn from an unknown
distribution $\mathcal{D}$. Then the following inequality holds with probability
at least $1 - O\left(\frac{\log(dDT)}{T^3} + \frac{d\log(dT)}{T^{30}}\right)$ for
any future queries $q$ drawn from $\mathcal{D}$:

$$\mathbb{E}_{q \sim \mathcal{D}}\left[\,|w_T \cdot q - w^* \cdot q|\,\right] \le \frac{\sqrt{D}\log(T)}{\sqrt{T}}.$$


In the subsequent sections we will prove Theorem 1 and conduct further analysis
of the algorithm. Unless stated otherwise, log denotes the natural logarithm.

Algorithm 1 - OnlineBisection 

Input: Stream $q^1, \ldots, q^T$ of $T$ queries, database mechanism $\mathcal{M}$
and binary oracle $\mathcal{O}$.

Output: A sequence of answers $(w^1 \cdot q^1, \ldots, w^T \cdot q^T)$, returned online.

begin 

Choose an orthonormal basis C = {e 1 ,..., e d } of U. 

Let f> = 2arcsin(^). 

Let $I_i = [-\sqrt{D}, \sqrt{D}]$, $N_i^+ = 0$ and $N_i^- = 0$ for $i = 1, \ldots, d$.

for $t = 1, \ldots, T$ do

Output $w_{approx} \cdot q^t$ for any $w_{approx} = f_1 e^1 + \cdots + f_d e^d$, where

$f_i \in I_i$, $i = 1, \ldots, d$.

if |Zi| < l °^^J for i = 1, ...,d continue, 
if 3i* G {1,..., d} such that arccos(e i , ,i ) < <j) then 
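Let $m = -\max_{f_1 \in I_1, \ldots, f_d \in I_d} \sum_{i=1}^{d} f_i e^i \cdot (-q^t)$.

Let $M = \max_{f_1 \in I_1, \ldots, f_d \in I_d} \sum_{i=1}^{d} f_i e^i \cdot q^t$.
Let $b = \mathcal{O}(\mathcal{M}(q^t), \frac{m + M}{2})$.

If $b > 0$ update $N_{i^*}^+ \leftarrow N_{i^*}^+ + 1$, otherwise update

$N_{i^*}^- \leftarrow N_{i^*}^- + 1$.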

end 

Let Ap = P(-J^L <s<\M) jNi = N f + N~ and 

$N_{crit} = \frac{30 \log(T)}{(\Delta p)^2}$.

if $N_i > N_{crit}$ for $i = 1, \ldots, d$ then

Run ShrinkHyperCube($I_1, \ldots, I_d$, $N_1^+, \ldots, N_d^+$, $N_1^-, \ldots, N_d^-$).
Update: $N_i^+ \leftarrow 0$, $N_i^- \leftarrow 0$ for $i = 1, \ldots, d$.

end 

end 

end 


3 The Algorithm 

We will now present an algorithm (Algorithm 1) that achieves the theoretical
guarantees from Theorem 1. Our algorithm, called OnlineBisection, maintains a tuple
of intervals $(I_1, \ldots, I_d)$ which encode a hypercube that contains the
database-vector $w^*$ (projected onto $\mathcal{U}$) with very high probability.
For each incoming query-vector $q^t$ the algorithm outputs an answer
$w_{approx} \cdot q^t$, where $w_{approx}$ is an arbitrarily selected vector in the
current hypercube. The query-vectors received by the algorithm are used to
progressively shrink the hypercube.

As the hypercube shrinks, vector $w_{approx}$ $\epsilon$-approximates $w^*$ for
smaller values of $\epsilon$. When the hypercube is large the errors made by the
algorithm will be large, but on the other hand larger hypercubes are easier to
shrink since they require fewer queries to ensure that the hypercube continues to
contain $w^*$ (with very high probability) after shrinking. This observation plays
a crucial role in establishing upper bounds on the average error made by the
algorithm on the sequence of $T$ queries.

After outputting an answer for query-vector $q^t$, the algorithm checks whether
$q^t$ has a large inner product with at least one vector in an orthonormal basis
$\mathcal{C} = \{e^1, \ldots, e^d\}$ of $\mathcal{U}$. If so, $q^t$ represents an
observation for that basis vector; whether it is a positive or negative
observation depends on the response of the binary oracle $\mathcal{O}$. The
threshold given by the algorithm to $\mathcal{O}$ is chosen by solving the linear
program $\max_{y \in \mathcal{HC}} q \cdot y$ for $q = q^t$ and $q = -q^t$, where
$\mathcal{HC}$ is the current hypercube. As we will see in Section A, this linear
program is simple enough that there is a closed-form expression for its optimal
value. So we do not need to use the simplex method or any other linear programming
tools.

Algorithm 2 - ShrinkHyperCube 

Input: Ii = [x 1 ,y 1 ],...,l d = [x d ,y d ], N +,..., N+, 

Output: Updated hypercube {1\.... ,I d )■ 

begin 

Let a = §, Ap = P(-^ <£ < ^), Vi = P(£ > w) and 
Ni = N+ + Nr. 
for i = 1 ,,d do 

if N+ > N lPl + then 

| $I_i \leftarrow [y_i - \alpha(y_i - x_i), y_i]$;

else

| $I_i \leftarrow [x_i, x_i + \alpha(y_i - x_i)]$;

end 

end 

end 

The optimal values $m$ and $M$ of the linear programs solved by the
OnlineBisection algorithm represent the smallest and largest possible value of the
inner product of the query-vector and a vector from the current hypercube. The true
value lies in the interval $[m, M]$. By choosing the average of these two values
as a threshold for the oracle we are able to effectively shrink direction $i^*$.
The intuition is that if the query-vector forms an angle $0$ with this direction
and there is no noise added, then by choosing the average we basically perform
standard binary search. Since the angle is not necessarily $0$ but is relatively
small (and noise is added that perturbs the output), the search is not exactly
binary. Instead of two disjoint subintervals of $I_{i^*}$ we get two intervals
whose union is $I_{i^*}$ but that intersect. Still, each of them is only a
fraction of the length of $I_{i^*}$, and that still enables us to significantly
shrink each dimension whenever a sufficient number of observations (specifically,
$N_{crit}$ observations) have been collected for each basis vector, by calling the
ShrinkHyperCube subroutine (Algorithm 2).
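
The idealized case mentioned above (query aligned with a basis vector, no noise) reduces to ordinary bisection on one coordinate. The following sketch shows only that special case; `bisect_coordinate` is a hypothetical name, and the real algorithm replaces the single comparison by many noisy oracle answers and overlapping subintervals.

```python
def bisect_coordinate(w_i, lo, hi, steps):
    """Noiseless bisection on one coordinate of the hidden vector.

    When the query equals the basis vector e_i, the extreme inner products
    over the hypercube are m = lo and M = hi, so the threshold is their average.
    """
    for _ in range(steps):
        theta = (lo + hi) / 2.0   # threshold (m + M) / 2
        if w_i > theta:           # oracle answer without noise
            lo = theta
        else:
            hi = theta
    return lo, hi

lo, hi = bisect_coordinate(0.3, 0.0, 1.0, 10)
```

After `steps` comparisons the interval still contains the hidden coordinate and its length has been halved `steps` times.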

Every shrinking of the hypercube decreases each edge by a factor $\alpha$ for some
$0 < \alpha < 1$. A logarithmic number of shrinkings is needed to ensure that any
choice of $w_{approx}$ in the hypercube will give an error of the order
$O(1/\sqrt{T})$. Notice that $N_{crit}$ grows with T, which reflects the fact that
for smaller hypercubes more observations are needed to further shrink the
hypercube while preserving the property that it contains the database-vector $w^*$
with very high probability. This is the case since if the hypercube is small we
already know a good approximation of the database vector, so it is harder to find
an even more accurate one under the same level of noise. When the hypercube is
small enough (the stopping condition on $|I_i|$ for $i = 1, \ldots, d$ in
Algorithm 1) there is no need to shrink it anymore, since each vector taken
from the hypercube is a precise enough estimate of the database vector.
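
The role of repeated observations in filtering out noise can be illustrated with a deliberately simplified experiment. This sketch assumes symmetric uniform noise and uses a plain majority vote; it is not the counting rule of ShrinkHyperCube, and the function name and all numbers are hypothetical.

```python
import random

def majority_side(w_i, theta, noise_width, n, rng):
    """Return True if most of n noisy comparisons report w_i + noise > theta."""
    positives = sum(
        1 for _ in range(n)
        if w_i + rng.uniform(-noise_width, noise_width) > theta
    )
    return 2 * positives > n

rng = random.Random(1)
# The hidden value is 0.2 above the threshold, while the noise ranges over
# [-1, 1]. A single answer is right with probability 0.6 only; the vote over
# 2000 answers is right except with negligible probability.
answer = majority_side(0.7, 0.5, 1.0, 2000, rng)
```

When the gap between the hidden value and the threshold shrinks, the per-answer bias shrinks with it, so more answers are needed for the same confidence. This is the effect behind the growth of $N_{crit}$ for smaller hypercubes.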

Note that choosing an orthonormal basis $\mathcal{C} = \{e^1, \ldots, e^d\}$ of
$\mathcal{U}$ does not require the knowledge of the distribution $\mathcal{D}$
from which queries are taken. We only assume that queries are from a
low-dimensional linear subspace $\mathcal{U}$ of $d$ dimensions. It suffices to
have as $\{e^1, \ldots, e^d\}$ some orthonormal basis of that linear subspace.
There are many state-of-the-art mechanisms (such as PCA) that are able to extract
such a basis, and thus we will not focus on that, but instead assume that such an
orthonormal system is already given. Notice that in practice those techniques
should be applied before our algorithm can be run. Since such a preprocessing phase
requires sampling from $\mathcal{D}$ but does not require access to the database
system, we can think about it as a preliminary period, where evaluation is not
being conducted.
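
As a rough illustration of the preprocessing step, an orthonormal basis of the span of a few sample queries can be built with Gram-Schmidt orthogonalization. This is a swapped-in technique chosen for brevity and it assumes the samples lie exactly in the subspace; with samples that only lie near the subspace one would use PCA as the text suggests. The helper name and the sample vectors are hypothetical.

```python
import math

def gram_schmidt(samples, d, tol=1e-9):
    """Build up to d orthonormal vectors spanning the given sample queries."""
    basis = []
    for x in samples:
        r = list(x)
        for e in basis:
            c = sum(a * b for a, b in zip(r, e))
            r = [a - c * b for a, b in zip(r, e)]   # remove component along e
        norm = math.sqrt(sum(a * a for a in r))
        if norm > tol:                               # skip dependent samples
            basis.append([a / norm for a in r])
        if len(basis) == d:
            break
    return basis

# Three queries in R^4 that span a 2-dimensional subspace.
samples = [[1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
basis = gram_schmidt(samples, 2)
```

Only queries are needed here, not oracle answers, which matches the remark that this phase requires no access to the database system.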

4 Theoretical analysis 

In this section we prove Theorem 1. We start by introducing several technical
lemmas. Their proofs will be given in the Appendix. We show here how those
lemmas can be combined to obtain our main result.

We denote: hr = \^t) ■ Tims the stopping condition for shrinking the hy¬ 
percube is of the form: \Lj\ < for i = 1 

We start with the standard concentration result regarding binomial random 
variables. 

Lemma 1. Let Z m = Bin(m,pi), W m = Bin(m,pi + Ap) and pi = mp\. 

Then the following is true: 



m(Ap) 2 


10 


(1) 



™.(4p) 2 
Definition 4. Let $\mathcal{HC}$ be a $d$-dimensional hypercube in $\mathbb{R}^D$.
We denote by $l(\mathcal{HC})$ the length of its side measured according to the
$L_2$-norm (recall that all the sides of a hypercube have the same length).

The next lemma is central for finding an upper bound on the average error made
by the algorithm.

Lemma 2. Let (q ±,..., qr) be a sequence of T queries. Let TTCq, ..., TLC S be a 
sequence of d-dimensional hypercubes in It 0 . Assume that l('HCi+ 1 ) < allfHCi) 
for i = 0,..., s — 1 and some 0 < a < 1. Denote 1 (HCq) = L < D and assume 
that s = lo<y log 2 {L\fdh(T)), where h(T) is some function ofT. Assume that 

w* £ LLCq 7dC s . Let £ be a random variable defined on the interval [— u, u) 

for some constant u > 0, with density p continuous at 0, and such that p(0) > 0. 

Define 4> e {i) = IP(— L ^ —— < £ < L " ^ ' 11 for some constant 0 < e < Let 
rrii = C log(T) for some constant C > 0 and let ki = niir for some other 
constant r > 0 and i = 0,..., s. Assume that learning algorithm uses a vector 
Wapprox G T~tCo to answer first ko queries, a vector w approx £ HCi to answer 
next ki queries, etc. Assume also that an algorithm uses a vector w approx £ HC S 
to answer remaining T — Ylo-o ki queries. Then the following is true about the 
cumulative error e cum made by the algorithm: 

e C um = 0(L 2 Didr\og(T)h(T) + ^j). 

In the following lemma we analyze cutting the hypercube according to some 
linear threshold. 

Lemma 3. Let $w \in \mathbb{R}^D$, let $\{v^1, \ldots, v^d\}$ be a system of
pairwise orthogonal vectors such that $v^i \in \mathbb{R}^D$, $\|v^i\|_2 = L$ for
$i = 1, \ldots, d$ and let
$\mathcal{HC} = \{w + \sum_{i=1}^{d} f_i v^i : f_1, \ldots, f_d \in [0,1]\}$ be a
$d$-dimensional hypercube. Let $e$ be a unit-length vector in $L_2$ that is
parallel to $v^1$, i.e. $e = \frac{1}{L} v^1$. Let $z$ be a unit-length vector
satisfying:
z ■ e > cos (9) for some 0 < 9 < ^. Let 0 < /3 < 1. Define m = min y^piC V ■ 2 
and M = maxyg-^c V ■ z. Let TLCi = {y £ TIC : z ■ y < m + /3(M — m)} and 
TLC r = {y £ TIC : z ■ y > m + fd{M — in)}. Then for e = 8sin(|)-\/d-' 

max e • y — min e ■ y < L(B + e) (3) 

VdUCi y yGHCi 


and 


max e • y — min e ■ y < L( 1 — 6 + e). (4) 

yeHCr y&HCr 

We are ready to prove Theorem 1 assuming that the presented lemmas are true.
Proof. Let $L = 2\sqrt{D}$. Let us notice that the algorithm can be divided into
s + 1 phases, where in the i th phase ( i = 0,..., s) all the intervals li are of 
length Lor 1 and s = log 2 {L\/dhT). Indeed, whenever the shrinking is 

conducted, the length of each side of the hypercube decreases by a factor $\alpha$ (see
subroutine ShrinkHyperCube), the initial lengths are $2\sqrt{D}$ and the shrinking
is not performed anymore if the side of each length is at most —. We will 
call those phases: 1st-phase, 2nd-phase, etc. Notice also that the value of the
parameter $N_{crit}$ is constant across a fixed phase since this number changes
only when the ShrinkHyperCube subroutine is performed. Let us denote the value of
$N_{crit}$ during the $i$th phase of the algorithm as $n_i$. Notice that
$n_i = \frac{30 \log(T)}{(\Delta p_i)^2}$, where
$\Delta p_i$ is the value of the parameter $\Delta p$ of the algorithm used in the
$i$th phase. Denote by $k_i$ the number of queries that need to be processed in
the $i$th phase for $i = 0, \ldots, s-1$. Parameter $k_i$ is a random variable but
we will show later that
with high probability: ki < nir for i = 0,..., s — 1, where: r = Assume 

now that this is the case. Denote by $\mathcal{HC}_0, \ldots, \mathcal{HC}_s$ the
sequence of hypercubes constructed by the algorithm. Assume furthermore that
$w^* \in \mathcal{HC}_0 \cap \cdots \cap \mathcal{HC}_s$. Again, we have not proved
it yet; we will show later that this happens with high probability. However we
will prove now that under these two assumptions we get the average error proposed
in the statement of Theorem 1. Notice that under these assumptions we can use
Lemma 2 with $L = 2\sqrt{D}$, $h(T) = h_T$, $\phi_\epsilon(i) = \Delta p_i$,
$C = 30$, $m_i = n_i$. We get the following bound on the cumulative

error: 

e cum =0(Didr\og(T)h(T) + ^). (5) 


Thus the average error is at most e av < \pu By using the expression h(T ) = 


log(T) 



in the above formula, we obtain the bound from the statement of Theorem 


It remains to prove that our two assumptions are correct with high probability
and find a lower bound on this probability that matches the one from
the statement of the theorem. We will do it now. Let us focus on the $i$th phase
of the algorithm. First we will find an upper bound on the probability that the
number of queries processed in this phase is greater than $k_i$. Fix a vector
$e^j$ from the orthonormal basis $\mathcal{C}$. The probability that a new query
$q$ is within angle $\phi$ from $e^j$ is at least $p = p_{\mathcal{D},\phi}$, by
the definition of $p_{\mathcal{D},\phi}$. Assume that $u_i$ queries
were constructed. By standard concentration inequalities, such as Azuma’s in¬ 
equality, we can conclude that with probability at least 1 — e~ 2ui ^ at least 
^ of those queries will be within angle 4> from e J . If we take: Ui > ^ i , then we 

conclude that with probability at least 1 — e ~ 2ui at least rii of those queries 
will be within angle cj> from e- 7 . Denote iq = mr, where r > |. We see that the 

_ A 

considered probability is at least 1 — e 2 n *' r . Using the expression on m we get 

p2 

that this probability is at least 1 — e~ 30r ^~ log ( T ). Notice that when m queries 
within angle (f> from a given vector £ C are collected, the j th dimension is 
ready for shrinking. Thus taking union bound over 0(\og(dT)) phases and all 
d dimensions we see that if we take ki = rni , where: r = then with prob¬ 
ability at most dlo ^u T ^ some i th phase of the algorithm for i £ {0,..., s — 1} 
will require more than kj queries. Now let us focus again on the fixed i th phase 
of the algorithm. Assume that ShrinkHyperCube subroutine is being run. Fix 
some dimension $j \in \{1, \ldots, d\}$. We know that, with high probability, at
least $n_i$ queries $q$ that were within angle $\phi$ from the vector
$e^j \in \mathcal{C}$ were collected. Denote by $w^*_j$ the $j$th coordinate of
$w^*$. Let $I_j = [x_j, y_j]$ and assume that $w^*_j \in [x_j, y_j]$.
Let us assume that the ShrinkHyperCube subroutine replaced $I_j = [x_j, y_j]$ by
$I'_j$. We want to show that with high probability segment $I'_j$ is constructed in
such a way that $w^*_j \in I'_j$. Denote $l = y_j - x_j$ and
$\delta = (\alpha - \frac{1}{2})l$. Notice first that if
$w^*_j \in [x_j + (1-\alpha)l, x_j + \alpha l]$ then $w^*_j$ will be in $I'_j$,
since no matter how $I'_j$ is constructed, it always contains
$[x_j + (1-\alpha)l, x_j + \alpha l]$. So let us assume that this is not the
case. Thus we have either $w^*_j \in [x_j, x_j + (1-\alpha)l]$ or
$w^*_j \in [y_j - (1-\alpha)l, y_j]$. Let us assume first the former. Consider a
query-vector $q$ within angle $\phi$ of $e^j$ that contributed to $N_j$. Let us
denote by $p_+$ the probability of the following event $\mathcal{I}_q$: for $q$
the oracle $\mathcal{O}$ gives answer: “greater than $\theta$”. Observe that
the total error made by the database mechanism $\mathcal{M}$ while computing the
dot-product $w^* \cdot q$ is $D\mathcal{E}$. Now notice, that by Lemma 3 and the
definition of $\mathcal{E}$,
probability p + is at most P (D£ > S — el), where: e = 8sin(|)\/d = g. Thus 

we get: p+ < P(£ > ^ e ^ ). Notice that in the i th phase the hypercube un¬ 

der consideration has the side of length exactly a 1 . Thus, since a = |, we get: 

P+ < P(£ > 1 1 p L " ). Let us assume now that w* £ [yj — (1 — a)l,yj]. We 
proceed with the similar analysis as before. We see that the probability P+ of an 

event I q is at least P (D£ > — 6 + el). Thus we obtain: P + > P(£ > — (4 ^ La ). 
But now we see, by Lemma [TJ using: m = Ni, pi = P(£ > p La ) and 
Ap = P( 1 1 p L " < £ < — p L " ) that N+ > Nipi + AiAp j s satisfied if 

tii(Ap ) 2 

Wj £ [xj,Xj + (1 — a)(yj — Xj)\ with probability at most e . Similarly, 

N+ < Nipi + AiAp i s satisfied if if w* £ [yj,yj — (1 — a)(yj — Xj)\ with probabil- 

riiiAp ) 2 

ity at most e to . We can use Lemma since (as it is easy to notice) in the 
i th phase Ap is exactly Ap t = P(— (l l ^ L " < £ < ' ! ^-" ) and p\ is exactly 
P(£ > — p 11 " ). We obtain the following: the probability that there exists i 

such that w* £ HCq IT • • • D T-LCi is at most: 0(X^i=o e ~ ^ 1 °* )■ Substituting in 
that expression the formula on m, and noticing that the number of all the phases 
of the algorithm is logarithmic in T, D and d , we get the bound 0( los ^f >T l ). 
Thus, according to our previous remarks, we conclude that with probability at 
least 1 — 0( los( ^ T) + dlo T S 3 ( o T) ) OnlineBisection algorithm makes an average 
error at most: e av = 0(^(D 3 d^r\og(T)hT + -*j^-))- As mentioned before, we 
complete the proof by using the formula: hr = ■ I 


5 Conclusions 

We presented in this paper the first $\tilde{O}(1/\sqrt{T})$-error algorithm for database
reconstruction in the online setting, using logarithmic memory and $O(dD)$ time per
query. It is designed for the highly challenging, yet very realistic, setting where
the answers given by the database are heavily perturbed by random noise
and there exists a strong privacy mechanism (the binary oracle O) that aims to
protect the database against an adversary attempting to compromise it. We show
that even if the learning algorithm receives only binary answers on the database
side and needs to learn the database-vector w* with high precision at the same time
as it is being evaluated, it can still achieve very small average error. We assume
that the query-space is low-dimensional, but this fact is needed only to guarantee
that the term r from the bound on the error is not exponential in D.

The low-dimensionality assumption is indispensable here if one wants to achieve
average error of the order o(1) in a nontrivial setting with random noise. The
OnlineBisection algorithm adapts the threshold values sent to the binary oracle O
to its previous answers in order to obtain a good approximation of the projection
of the database-vector w* onto a low-dimensional query-space U.


A Analysis of the running time of the algorithm and memory usage


We start with the analysis of the running time of OnlineBisection. First we will 
show that the linear program used by the algorithm to determine the threshold 
in each round has a closed form solution. 


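Lemma 4. For any query-vector $q$, segments $I_1 = [x_1, y_1], \ldots, I_d = [x_d, y_d]$ and
orthonormal basis $C = \{e^1, \ldots, e^d\}$, the value

$$\max_{f_1 \in I_1, \ldots, f_d \in I_d} \; \sum_{i=1}^{d} f_i \, e^i \cdot q$$

is given by

$$opt = \sum_{j \in J_+} y_j \, e^j \cdot q + \sum_{j \in J_-} x_j \, e^j \cdot q,$$

where $J_+ = \{i \in \{1, \ldots, d\} : e^i \cdot q > 0\}$ and $J_- = \{i \in \{1, \ldots, d\} : e^i \cdot q < 0\}$.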


Proof. Take some point $c_1 e^1 + \cdots + c_d e^d$, where $x_i \le c_i \le y_i$ for $i = 1, \ldots, d$.
For $j \in J_+$ the following is true: $c_j \, e^j \cdot q \le y_j \, e^j \cdot q$, since $c_j \le y_j$ and $e^j \cdot q > 0$.
Similarly, for $j \in J_-$ we have $c_j \, e^j \cdot q \le x_j \, e^j \cdot q$, again by the definition of $J_-$.
Combining these inequalities we get that for every point $v$ in the hypercube $\mathcal{HC}$
induced by $I_1, \ldots, I_d$ and $C$ the following is true: $v \cdot q \le opt$. Besides, clearly
there exists $v^* \in \mathcal{HC}$ such that $v^* \cdot q = opt$. □
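
The closed form of Lemma 4 can be illustrated with a short Python sketch. The function names, the toy basis, the segments and the query below are hypothetical and chosen only for illustration; the brute-force check relies on the fact that a linear function over a box attains its maximum at a corner.

```python
from itertools import product


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def closed_form_max(q, segments, basis):
    """Closed-form maximum from Lemma 4.

    segments[j] = (x_j, y_j) and basis[j] = e^j (illustrative names).
    For each j, pick y_j when e^j . q > 0 and x_j otherwise; when
    e^j . q == 0 the term contributes nothing either way.
    """
    opt = 0.0
    for (x_j, y_j), e_j in zip(segments, basis):
        s = dot(e_j, q)  # O(D) work per basis vector, O(dD) in total
        opt += (y_j if s > 0 else x_j) * s
    return opt


def brute_force_max(q, segments, basis):
    """Reference value: evaluate the objective at every corner of the box."""
    return max(
        sum(c * dot(e, q) for c, e in zip(corner, basis))
        for corner in product(*segments)
    )


# Toy instance (assumed values): D = 3, d = 2.
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
segments = [(-1.0, 2.0), (0.5, 3.0)]
q = (2.0, -1.0, 5.0)

print(closed_form_max(q, segments, basis))  # 3.5
print(brute_force_max(q, segments, basis))  # 3.5
```

The closed form needs d dot products of length D, while the brute-force reference enumerates all 2^d corners and is only practical for small d.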

Now let us fix a query q. It is easy to see that q is processed by the
algorithm in O(dD) time. Indeed, a single query requires updating O(d) variables
of the form $N_i^+$, $N_i^-$ and computing the closed-form solution given in Lemma 4 in
O(dD) time. Computing the dot product of the query with the given approximation
of the database vector clearly takes O(D) time. Thus OnlineBisection runs in
O(dD) time per query. Notice that the OnlineBisection algorithm does not store
any nontrivial data structures, only the segments $I_1, \ldots, I_d$, the counts $N_i^+$, $N_i^-$ for
$i = 1, \ldots, d$ and a constant number of other variables. The counts can be represented
by O(log(T))-digit numbers, thus we conclude that OnlineBisection runs in
O(log(T)) memory.


B Proof of Lemma 1

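Proof. The proof follows from standard concentration inequalities. Let $\delta_1, \delta_2 > 0$.
Note that $E(Z_m) \le m p_1$ and $E(W_m) \ge m p_1 + m \Delta p$. Denote $\mu_1 = E(Z_m)$ and $\mu_2 = E(W_m)$.
Note that by Chernoff's inequality we have:

$$P\big(Z_m > (1 + \delta_1)\mu_1\big) \le e^{-\frac{\delta_1^2}{2 + \delta_1}\mu_1}.$$

Similarly, $P\big(W_m < (1 - \delta_2)\mu_2\big) \le e^{-\frac{\delta_2^2}{2}\mu_2}$. Take $\delta_1 = \frac{m \Delta p}{2\mu_1}$, $\delta_2 = \frac{m \Delta p}{2\mu_2}$.
Using these values of $\delta_1$ and $\delta_2$, we obtain:

$$P\Big(Z_m > \mu_1 + \frac{m \Delta p}{2}\Big) \le e^{-\frac{\delta_1}{2 + \delta_1} \cdot \frac{m \Delta p}{2}}.$$

Similarly,

$$P\Big(W_m < m p_1 + \frac{m \Delta p}{2}\Big) \le P\Big(W_m < \mu_2 - \frac{m \Delta p}{2}\Big) \le e^{-\frac{\delta_2}{2} \cdot \frac{m \Delta p}{2}}.$$

Notice that $\delta_1, \delta_2 \ge \frac{\Delta p}{2}$ (the latter inequality holds because obviously $\mu_2 \le m$). Thus
we get:

$$P\Big(Z_m > \mu_1 + \frac{m \Delta p}{2}\Big) \le e^{-\frac{m (\Delta p)^2}{2(4 + \Delta p)}} \quad \text{and} \quad P\Big(W_m < m p_1 + \frac{m \Delta p}{2}\Big) \le e^{-\frac{m (\Delta p)^2}{2(4 + \Delta p)}}.$$

Since $\Delta p < 1$, the proof is completed. □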
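
The tail bounds of this proof can be sanity-checked numerically. The sketch below assumes, purely for illustration, that $Z_m$ is Binomial(m, p_1) and $W_m$ is Binomial(m, p_1 + Δp), which is the boundary case of the two expectation conditions; the parameter values are hypothetical. It compares exact binomial tails at the threshold $m p_1 + m\Delta p/2$ with $e^{-m(\Delta p)^2/10}$, the value obtained from $2(4 + \Delta p) < 10$ when $\Delta p < 1$.

```python
import math


def binom_pmf(m, p, k):
    """P(X = k) for X ~ Binomial(m, p)."""
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def upper_tail(m, p, t):
    """Exact P(X > t) for X ~ Binomial(m, p)."""
    return sum(binom_pmf(m, p, k) for k in range(math.floor(t) + 1, m + 1))


def lower_tail(m, p, t):
    """Exact P(X < t) for X ~ Binomial(m, p)."""
    return sum(binom_pmf(m, p, k) for k in range(0, math.ceil(t)))


def tail_bound(m, delta_p):
    """exp(-m * delta_p^2 / 10), an upper bound on exp(-m dp^2 / (2(4 + dp)))."""
    return math.exp(-m * delta_p**2 / 10)


# Hypothetical parameters, chosen only for illustration.
m, p1, delta_p = 200, 0.3, 0.2
threshold = m * p1 + m * delta_p / 2

z_tail = upper_tail(m, p1, threshold)            # Z_m overshoots the threshold
w_tail = lower_tail(m, p1 + delta_p, threshold)  # W_m undershoots the threshold
bound = tail_bound(m, delta_p)

print(z_tail <= bound, w_tail <= bound)
```

For these parameters the exact tails are far below the bound, which is expected: the Chernoff-style estimate is loose but holds uniformly in m and Δp.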


C Proof of Lemma 2


Proof. Note first that for any d-dimensional hypercube $\mathcal{HC} \subseteq \mathbb{R}^D$ of side length
$l$, two vectors $w^1, w^2 \in \mathcal{HC}$ and a vector $q = (q_1, \ldots, q_D)$ such that $|q_i| \le 1$ for
$i = 1, \ldots, D$ the following is true: $|w^1 \cdot q - w^2 \cdot q| \le l\sqrt{dD}$. This comes from
the fact that $\|w^1 - w^2\|_2 \le l\sqrt{d}$, $\|q\|_2 \le \sqrt{D}$ and the Cauchy-Schwarz inequality.
Thus we see that the cumulative error $e^1_{cum}$ made by the algorithm for the
first Yli=oki queries satisfies: ej um < Ei=o kiLa l VdD < Ly/dDrY^i-i rriia 1 . 
Therefore we have: e]. um < CL\/dDr\og{T)Y^ i=0 -^jj)- We can write: e]. urn < 
CLy/dDr log(T) E;=o + CLyfdDr log(T) Y^Ut+i where t is the small¬ 
est index such that p(x) > for x £ [— ^-]. Since p is continuous at 0, t 

is well-defined. Notice that t does not depend on d, D and T, but only on the 
random variable £ and constant a. Observe that CLy/dDr log(T) ^(i) — 

CLy/dDr log(T) ’ where the last inequality follows im¬ 

mediately from the definition of t (density p on the interval considered in the 
definition of <j> £ (t) is at least thus the related probability is at least: the 


length of that interval times ^=-, i.e.: <j> £ (t) > - — j y — ' ■). Therefore the 

considered expression is of the order 0(Ly/dDir log(T)). Now let us focus on 
the expression: 1Z = CLy/ dDr log(T) 1 -zrns- From the definition of t we 


=t +1 

get: TZ < C Ly/dDi r\og(T)II , where U = 

A 

p 2 ( 0 ) 


0 a 2 , a % 2p2(0) - Therefore 1Z < 

5 


Ei =1 <*"*■ Thus we have: 1Z < ; _g_((i)^ + i _ 

!) < 32G ^d r a ]° a S s (T) ■ Using the formula on a, we get: TZ < ■ 

Combining this upper bound on 7 Z with the upper bound on the previous ex¬ 
pression, we obtain: A cum = 0(L 2 D^dr log(T)h(T)). Next let us focus on the 
cumulative error e 2 um made by the algorithm for the remaining T — Ei=o 
queries. By the definition of s we know that l(/HC s ) < . This implies that 

for any w £ TiCs we have: ||u> — u >*\\2 < j ^y. Thus clearly for any query com¬ 
ing in this phase the learning algorithm makes an error at most (again, 
by Cauchy-Schwarz inequality) and we have at most T queries in this phase. 
Therefore e 2 um = O(j^T). That completes the entire proof. | 


D Proof of Lemma 3

Proof. Denote: 77 = e — z. Note that U 77 U 2 < 2sin(|). Take first y G TLCi. We 
have: m < z-y < m+f3(M—m). Thus m+rj-y < e-y < m+/3(M—m)+r]-y. Define: 
rh = min v ^uc U ■ e and M = max-y^uc U ■ e. Notice that: |m — m| < 2 sin(|)L-\/d 
and \M — M\ < 2sin(| )L\fd. This follows directly from the fact that: ||y ||2 < 
LVd, ||? 7|| 2 < 2 sin(|) and Cauchy-Schwarz inequality. Thus we obtain: rh — 
2sin(|)L\/d + 77 - y <e-y< m + 2 sin(|)i\/d + 0(M — rh + 4 sin(|)Lv / d) + 77 • y. 
Since, from the definition of M, rh and HC we have: M — rh = L, we obtain: 
tti — 2 sin(|)L-\/d + rj-y<e-y<m + 2 sin( \)L\fd + fi(L + 4sin(|)Lv / d) + 77 -y. 
Therefore max^g-Hd e-y — min^g^c! e • y < L(/3 + 8 sin(|)\/d)- This completes 
the proof of inequality [3] The proof of inequality [I] is completely analogous. 

I 


E Online-to-batch conversion 


Throughout the paper we have considered the challenging online scenario, where 
the algorithm both learns and is evaluated on a single set of streaming queries. 
However, we note that the OnlineBisection algorithm also works well in the 
batch setting, i.e. when there is a separate train and test phase. We prove here 
Corollary 1, which for clarity we state once more:

Corollary 1. Let $w_T$ denote the final hypothesis constructed by the OnlineBisection
algorithm after consuming T queries drawn from an unknown distribution
V. Then the following inequality holds with probability at least 1 — 0 ^ los ^pD _j_ 

d ) foT any future queries q drawn from V: 


$$\mathbb{E}_{q \sim \mathcal{D}}\big[\,|w_T \cdot q - w^* \cdot q|\,\big] \le \frac{\sqrt{D}\,\log(T)}{\sqrt{T}}.$$


Proof. This simply follows from the fact that, as argued in the proof of Theorem 1,
$w^* \in \mathcal{HC}_s$ with at least the probability indicated in the statement of this
corollary. Furthermore, by definition of the algorithm, we have $w_T \in \mathcal{HC}_s$ and
the length of the side of the hypercube $\mathcal{HC}_s$ is at most $\log(T)/\sqrt{T}$. Thus, with at least
the probability indicated, $|w_T \cdot q - w^* \cdot q| \le \|w_T - w^*\|_2 \, \|q\|_2 \le \frac{\sqrt{D}\,\log(T)}{\sqrt{T}}$.
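
The Cauchy-Schwarz step used in this proof can be checked on synthetic data. The sketch below is illustrative only: the vectors, the dimension D and the perturbation size are hypothetical and are not taken from the paper. It verifies that the prediction error on a query never exceeds the product of the two Euclidean norms, and that a query with entries in [-1, 1] has norm at most $\sqrt{D}$.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def l2_norm(v):
    """Euclidean norm."""
    return math.sqrt(dot(v, v))


def prediction_error(w_hat, w_star, q):
    """|w_hat . q - w_star . q|, the error made on a single query q."""
    return abs(dot(w_hat, q) - dot(w_star, q))


def cauchy_schwarz_bound(w_hat, w_star, q):
    """||w_hat - w_star||_2 * ||q||_2, which upper-bounds the prediction error."""
    diff = [a - b for a, b in zip(w_hat, w_star)]
    return l2_norm(diff) * l2_norm(q)


# Synthetic instance (assumed values, for illustration only).
rng = random.Random(0)
D = 50
side = 0.01  # hypothetical bound on the per-coordinate estimation error
w_star = [rng.uniform(0.0, 1.0) for _ in range(D)]
w_hat = [w + rng.uniform(-side, side) for w in w_star]
q = [rng.uniform(-1.0, 1.0) for _ in range(D)]

err = prediction_error(w_hat, w_star, q)
bound = cauchy_schwarz_bound(w_hat, w_star, q)
print(err <= bound, l2_norm(q) <= math.sqrt(D))
```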






